We would like to fit a linear regression model of total delay time in minutes. Before doing so, we need to check whether the interactions between several continuous predictors and airline, month, and time period, respectively, are significantly associated with total delay time. As a first step, we plotted each continuous predictor against total delay. For readability, we adjusted the axis ranges.
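To show what such an interaction means in R's model formula syntax, here is a minimal sketch on hypothetical toy data. The names `x`, `airline`, and `delay` are illustrative and are not the actual columns of `raw_df`. In `lm()`, `x * airline` expands to `x + airline + x:airline`, and the `x:airlineB` coefficient is the difference between airline B's slope and airline A's slope:

```r
# Hypothetical toy data (not the flight data): two airlines whose delays
# respond to the same continuous predictor with different slopes.
x <- rep(0:9, times = 2)
airline <- factor(rep(c("A", "B"), each = 10))

# True model, no noise: slope 1 for airline A, slope 3 for airline B
delay <- ifelse(airline == "A", 2 + 1 * x, 5 + 3 * x)
toy <- data.frame(x, airline, delay)

# `x * airline` expands to main effects plus the interaction x:airline
fit <- lm(delay ~ x * airline, data = toy)
coef(fit)
```

Because the toy data have no noise, the fit recovers the slopes up to floating-point error: the `x` coefficient is airline A's slope (1), and the `x:airlineB` coefficient is the slope difference 3 - 1 = 2. A clearly nonzero interaction coefficient corresponds to the non-parallel point clouds we look for in the plots.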


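The continuous predictors include the types of delay (carrier, extreme weather, late arrival, NAS, and security delay) and the weather variables (temperature, humidity, visibility, and wind speed). To plot each of them against total delay, we defined three helper functions that color the points by airline, month, and time period, respectively: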


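library(tidyverse)
library(plotly)

# Each helper plots total delay (`delay`) against a continuous predictor
# `cont`, a numeric vector aligned with the rows of `raw_df`, and colors
# the points by a grouping variable. The plot is returned visibly so it
# is displayed when the function is called.

# Continuous predictor vs. total delay, colored by airline
cont_airline = function(cont){
  raw_df %>% 
    mutate(text_label = str_c("Airline: ", airline)) %>% 
    plot_ly(x = ~cont, y = ~delay, color = ~airline,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}

# Continuous predictor vs. total delay, colored by month
# (month levels are reordered by their median `date`, i.e. chronologically)
cont_month = function(cont){
  raw_df %>%
    mutate(
      text_label = str_c("Month: ", month),
      month = fct_reorder(month, date)) %>% 
    plot_ly(x = ~cont, y = ~delay, color = ~month,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}

# Continuous predictor vs. total delay, colored by time period (`hour_c`)
cont_hour = function(cont){
  raw_df %>% 
    mutate(text_label = str_c("Period: ", hour_c)) %>% 
    plot_ly(x = ~cont, y = ~delay, color = ~hour_c,
            text = ~text_label, hoverinfo = "text",
            type = "scatter", mode = "markers", alpha = .5)
}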

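Interaction for Continuous Predictors

[Interactive figures: total delay vs. each continuous predictor, colored by airline, month, and time period. Types of Delay: Carrier Delay, Extreme Weather Delay, Late Arrival Delay, NAS Delay, Security Delay. Weather Specific: Temperature, Humidity, Visibility, Wind Speed.]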


Interpretation

Based on the graphs, the following interactions may be associated with total delay:

  1. Carrier Delay * Airline

  2. Temperature * Month

In the statistical analysis that follows, we focus on these interaction terms to test whether they need to be included in our linear regression model.
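One standard way to decide whether an interaction term is needed is a partial F-test that compares the model with and without it. The sketch below uses simulated data with hypothetical names (`x`, `airline`, `delay`) standing in for, for example, carrier delay and airline; the same comparison would apply to temperature and month. It assumes only base R (`lm()` and `anova()`):

```r
set.seed(1)
n <- 200

# Simulated (hypothetical) data: the slope of x is 1 for airline A, 3 for B
toy <- data.frame(
  x = runif(n, 0, 10),
  airline = factor(sample(c("A", "B"), n, replace = TRUE))
)
toy$delay <- ifelse(toy$airline == "A", 1, 3) * toy$x + rnorm(n)

# Reduced model: main effects only (one common slope for x)
fit_main <- lm(delay ~ x + airline, data = toy)

# Full model: adds the interaction (airline-specific slopes for x)
fit_int <- lm(delay ~ x * airline, data = toy)

# Partial F-test comparing the two nested models;
# the p-value is in the second row of the "Pr(>F)" column
cmp <- anova(fit_main, fit_int)
p_value <- cmp[["Pr(>F)"]][2]
p_value
```

A small p-value indicates that letting the slope differ across groups significantly improves the fit, which would justify keeping the interaction term. Here the simulated slopes (1 vs. 3) differ strongly, so the p-value is far below 0.05.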